TOP-C (Task Oriented Parallel C) is a freely available package for parallel computing. It is designed to be easy 
to learn and to have good tolerance for the high latencies that are common in commodity networks of computers. 
It has been successfully used in a wide range of examples, providing linear speedup with the number of computers. 
A brief overview of TOP-C is provided, along with recent experience with cosmic ray physics simulations. 



1. Introduction 

Ultra high energy cosmic rays are observed indirectly through detection of the extensive air
showers that are produced when they travel through the atmosphere. To adequately interpret
the measured observables and to infer the properties of the incident primary particle, a
full Monte Carlo treatment of the extensive air shower is needed. The CPU time required rises
with the primary energy. For example, for primary energies around 10^*^ eV a shower contains
about 10^^ secondary particles. The amount of computing time required to follow all the
particles seems to be prohibitive. Traditionally, sampling techniques are used to reduce the
number of particles tracked [1].

In this article we describe an ongoing program to use commodity parallel computing for fast
Monte Carlo simulations [||. The aim is to go beyond the simple event-level parallelism which
is commonly used today and actually run individual events faster than would be possible on a
single workstation or PC.
2. GEANT4 

For a variety of reasons, in no small part driven by the wish to work with software which
is likely to see use in the future, we decided to try to parallelize GEANT4, the C++ rewrite
of the older (FORTRAN 77) GEANT3. GEANT4 is an object-oriented simulation package that
provides general-purpose tools for defining and simulating detector geometry, material
properties, particle transport and interactions, visualization, and all relevant physics
processes. Its versatility allows it to be employed in applications beyond its traditional
usage in High Energy Physics experiments, from the medical and biological sciences to
Cosmic Ray Physics jj].

3. TOP-C 

TOP-C (Task Oriented Parallel C) ^ was initially designed with two goals in mind:

1. to provide a framework for easily developing parallel applications;

2. to build in the ability to tolerate the high latency typically found on Beowulf clusters. ^

The package is freely available Q. The same application source code has been run under shared
and distributed memory (SMP, IBM SP-2, NoW, Beowulf cluster). A sequential TOP-C library is
also provided to ease debugging. The largest test to date was a computer construction of
Janko's group over three months using approximately 100 nodes of an IBM SP-2 parallel
computer at Cornell University 0.

^The term "Beowulf cluster" refers to a cluster of systems running Linux and connected by
Ethernet.

The TOP-C programmer's model is a master-slave architecture based on three key concepts:

1. tasks in the context of a master/slave architecture;

2. global shared data with lazy updates; and

3. actions to be taken after each task.

Task descriptions (task inputs) are generated on the master, and assigned to a slave. The
slave executes the task and returns the result to the master. The master may update shared
data on all processes. Such global updates take place on each slave after the slave completes
its current task. The programmer's model for TOP-C is graphically described below.

[Figure: TOP-C Programmer's Model (Life Cycle of a Task). Recoverable labels:
CheckTaskResult(input, output), with a branch taken if action == REDO and a branch taken
if action == UPDATE, the latter leading to UpdateSharedData(input, output).]
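
To make the life cycle concrete, the following is a minimal sequential sketch in C++ that imitates it without using the TOP-C library. All names (`do_task`, `check_task_result`, `update_shared_data`, `run_life_cycle`, the `Action` values) are hypothetical stand-ins, not the TOP-C API, and the version counter used to detect stale results is an assumption made for illustration. Each round hands the same snapshot of the shared data to several simulated slaves, so a result computed before an update is detected as stale and resubmitted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical model of the task life cycle; none of these names are TOP-C calls.
enum class Action { NoAction, Redo, Update };

struct SharedData { int best = 0; int version = 0; };      // global shared data
struct TaskResult { int input; int candidate; int seen_version; };

// Slave side: compute a result against the snapshot of shared data it holds.
TaskResult do_task(int input, const SharedData& snapshot) {
    return {input, std::max(input, snapshot.best), snapshot.version};
}

// Master side: decide what to do with a result.
Action check_task_result(const TaskResult& r, const SharedData& shared) {
    if (r.seen_version != shared.version) return Action::Redo;  // stale snapshot
    return r.candidate > shared.best ? Action::Update : Action::NoAction;
}

// Applied once the master requests an update of the shared data.
void update_shared_data(SharedData& shared, const TaskResult& r) {
    shared.best = r.candidate;
    ++shared.version;
}

struct RunStats { int best; int redos; };

// Simulates `slaves` workers per round; each round shares one snapshot.
RunStats run_life_cycle(const std::vector<int>& inputs, std::size_t slaves) {
    assert(slaves > 0);
    SharedData shared;
    int redos = 0;
    std::deque<int> pending(inputs.begin(), inputs.end());
    while (!pending.empty()) {
        const SharedData snapshot = shared;           // lazy update: taken per round
        std::vector<TaskResult> results;
        for (std::size_t i = 0; i < slaves && !pending.empty(); ++i) {
            results.push_back(do_task(pending.front(), snapshot));
            pending.pop_front();
        }
        for (const TaskResult& r : results) {
            switch (check_task_result(r, shared)) {
                case Action::Redo:     pending.push_back(r.input); ++redos; break;
                case Action::Update:   update_shared_data(shared, r);       break;
                case Action::NoAction:                                       break;
            }
        }
    }
    return {shared.best, redos};
}
```

Calling `run_life_cycle({1, 2, 3}, 3)` finds the maximum while resubmitting the results that were computed against an outdated snapshot; with a single simulated slave no result is ever stale, so nothing is redone.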
4. Parallelization of GEANT4 Using TOP-C 

The task-oriented approach of TOP-C is ideally suited to parallelizing legacy applications.
The tactic in parallelizing GEANT4 was to perturb the existing software as little as possible
and to modify just the section of the code which handles particle tracking and interaction
(a frequent operation) to allow it to run on multiple CPUs. The largest difficulty was in
marshalling and unmarshalling the C++ GEANT4 track objects that had to be passed to the
slave processes. Marshalling is the process by which one produces a representation of an
object in a contiguous buffer suitable for transfer over a network, and unmarshalling is
the inverse process.
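
As an illustration of marshalling, here is a minimal C++ sketch for a much-simplified track. The `Track` struct and the `marshal`/`unmarshal` functions are hypothetical and far smaller than a real GEANT4 track object; the layout (fixed-size fields followed by a length-prefixed string) is one possible choice, and the sketch assumes sender and receiver share the same byte order and type sizes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a particle track (not a GEANT4 class).
struct Track {
    int particle_id;
    double energy;
    double position[3];
    std::string process_name;   // variable-length member
};

namespace detail {
// Append the raw bytes of a trivially copyable value to the buffer.
template <typename T>
void put(std::vector<unsigned char>& buf, const T& v) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&v);
    buf.insert(buf.end(), p, p + sizeof(T));
}
// Read a value back and advance the offset; asserts the buffer is long enough.
template <typename T>
T get(const std::vector<unsigned char>& buf, std::size_t& off) {
    assert(off + sizeof(T) <= buf.size());
    T v;
    std::memcpy(&v, buf.data() + off, sizeof(T));
    off += sizeof(T);
    return v;
}
}  // namespace detail

// Marshal: flatten the object into one contiguous buffer.
std::vector<unsigned char> marshal(const Track& t) {
    std::vector<unsigned char> buf;
    detail::put(buf, t.particle_id);
    detail::put(buf, t.energy);
    for (double c : t.position) detail::put(buf, c);
    detail::put(buf, static_cast<std::uint64_t>(t.process_name.size()));  // length prefix
    buf.insert(buf.end(), t.process_name.begin(), t.process_name.end());
    return buf;
}

// Unmarshal: rebuild the object from the buffer, in the same field order.
Track unmarshal(const std::vector<unsigned char>& buf) {
    std::size_t off = 0;
    Track t;
    t.particle_id = detail::get<int>(buf, off);
    t.energy = detail::get<double>(buf, off);
    for (double& c : t.position) c = detail::get<double>(buf, off);
    const std::size_t len = static_cast<std::size_t>(detail::get<std::uint64_t>(buf, off));
    assert(off + len <= buf.size());
    t.process_name.assign(reinterpret_cast<const char*>(buf.data()) + off, len);
    return t;
}
```

The variable-length string is why a plain byte copy of the whole struct would not work: the object holds a pointer to memory elsewhere, which is presumably the kind of issue that makes marshalling real track objects laborious.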

We developed a 6-step software methodology to incrementally parallelize GEANT4, allowing us
to isolate individual issues. The six steps were:

1. the use of .icc (include) files to isolate the added code from the original GEANT4 code;

2. collecting the code of the inner loop in a separate routine, DoTask(), whose input was
a primary particle track, and whose output was the primary and its secondary particles;

3. marshalling and unmarshalling the C++ objects for particle tracks;

4. integrating the marshalled versions of the particle tracks with the calls to DoTask();

5. adding calls to TOP-C routines such as TOPC_init() and TOPC_submit_task_input(),
and then testing as the marshalled particle tracks were sent across the network;

6. and finally adding CheckTaskResult(), which inspected the task output and added
the secondary tracks to the stack for later processing by other slave processes.
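
The interplay between steps 2 and 6 can be sketched sequentially as a stack-driven loop. This is a toy model in C++, not the GEANT4 code: `ToyTrack`, `do_task`, `check_task_result`, and `simulate_event` are hypothetical names, and the "physics" (a particle splits into two of half the energy until a threshold is reached) is an assumption chosen only to produce secondaries.

```cpp
#include <cassert>
#include <stack>
#include <vector>

// Hypothetical toy track; a real track carries far more state.
struct ToyTrack { double energy; };

// Stand-in for the DoTask() step: follow one primary and return its secondaries.
// Toy rule (an assumption): split in two while both halves stay at or above threshold.
std::vector<ToyTrack> do_task(const ToyTrack& primary, double threshold) {
    assert(threshold > 0.0);
    std::vector<ToyTrack> secondaries;
    if (primary.energy >= 2.0 * threshold) {
        secondaries.push_back(ToyTrack{primary.energy / 2.0});
        secondaries.push_back(ToyTrack{primary.energy / 2.0});
    }
    return secondaries;
}

// Stand-in for the CheckTaskResult() step: push secondaries for later processing.
void check_task_result(const std::vector<ToyTrack>& secondaries,
                       std::stack<ToyTrack>& track_stack) {
    for (const ToyTrack& s : secondaries) track_stack.push(s);
}

// Sequential driver: in a parallel run, each popped track would become a task input.
int simulate_event(double primary_energy, double threshold) {
    std::stack<ToyTrack> track_stack;
    track_stack.push(ToyTrack{primary_energy});
    int tracked = 0;
    while (!track_stack.empty()) {
        const ToyTrack t = track_stack.top();
        track_stack.pop();
        ++tracked;
        check_task_result(do_task(t, threshold), track_stack);
    }
    return tracked;
}
```

Because every popped track is independent of the others in this model, the loop body is the natural unit to hand to a slave process, while the stack itself stays on the master.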

Prior to the fifth step, all debugging was in 
a sequential setting. The maturity of the TOP-C 
library then allowed us to create fully functioning 
parallel code in less than a day. 
5. Discussion 






GEANT4 (approximately 100,000 lines of C++ code) was successfully parallelized using TOP-C.
In the future we plan to perform timing tests on a long run using many processors. Initial
results for the example described indicate that a single task in our application requires
approximately 1 ms of CPU time. Hence, it will be essential to submit approximately 100
particles at a time for a single slave process to compute in order to overcome network
overhead. Optimization of the parallel implementation is underway.
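
The batching argument can be checked with a small calculation. The sketch below, in C++, uses a simple model in which a batch of tasks shares one fixed communication cost; the function names are hypothetical and the 10 ms overhead used in the note after the code is an illustrative assumption, not a measured value from this work.

```cpp
#include <cassert>

// Fraction of time spent on useful work when `batch` tasks of `task_ms` each
// share a single communication cost of `overhead_ms` (simple additive model).
double batch_efficiency(int batch, double task_ms, double overhead_ms) {
    assert(batch > 0 && task_ms > 0.0 && overhead_ms >= 0.0);
    const double work_ms = batch * task_ms;
    return work_ms / (work_ms + overhead_ms);
}

// Smallest batch size whose efficiency reaches `target` (0 < target < 1).
int min_batch_for(double target, double task_ms, double overhead_ms) {
    assert(target > 0.0 && target < 1.0);
    int batch = 1;
    while (batch_efficiency(batch, task_ms, overhead_ms) < target) ++batch;
    return batch;
}
```

Under this model, with 1 ms tasks and an assumed 10 ms of overhead per message exchange, a single-particle task spends about 9% of its time on useful work, whereas a batch of 100 particles spends about 91%.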

TOP-C seems to be well-suited to the problem of parallelizing GEANT4, and would likely be
well-suited to other high energy physics and cosmic ray applications as well. Its flexibility
and simplicity make it possible to envision enormous speedups for GEANT4 within a single
event, something not often considered in high energy experiments, but offering advantages
over the usual event-by-event parallelism, especially during interactive data analysis and
code or hardware design.

Of particular interest is the parallelization of existing cosmic ray simulation programs such
as AIRES (D and CORSIKA [||. Although written in FORTRAN, such programs are in fact often
converted to C for compilation using f2c, and can certainly be linked with other C programs,
so we anticipate no major obstacles. We are always interested in collaboration with other
groups who may have needs for the speedups that our methodology offers.